One of the issues faced by script markers is marking free text answers against a canned answer. The following recipe provides a quick and dirty similarity check between a provided answer and a specimen answer using a small spacy natural language model, en_core_web_sm.
Performance of the similarity matcher may be improved by using a larger model (e.g. the medium sized en_core_web_md model or the full en_core_web_lg model), but these are rather slower to download, install and run.
In an nbgrader context, we can do something like the following in a question cell defined as part of a manually graded test:
# Add your commentary here

# For now, the specimen answer needs to be assigned to a specific variable.
# It would make more sense to define an %%assignment_text block magic that lets you write some
# text in a code cell and then let the magic assign this to a variable that can then be automatically tested.
answer_txt = '''
YOUR ANSWER TEXT
'''
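By way of illustration, a minimal sketch of such an %%assignment_text magic might look like the following. This is an assumption, not an existing nbgrader or IPython feature: the magic name, the default variable name and the wiring are all hypothetical.

```python
# Hypothetical %%assignment_text cell magic: assigns the cell body to a variable.
try:
    from IPython import get_ipython
except ImportError:
    # IPython is not installed; fall back so the function is still importable
    def get_ipython():
        return None

def assignment_text(line, cell, ns=None):
    """Assign the body of the cell to the variable named on the magic line.

    If no name is given on the line, default to answer_txt.
    """
    ns = ns if ns is not None else get_ipython().user_ns
    ns[line.strip() or "answer_txt"] = cell

ip = get_ipython()
if ip is not None:
    # Register as a cell magic only when running inside IPython/Jupyter
    ip.register_magic_function(assignment_text, magic_kind="cell")
```

In a notebook, the student would then write `%%assignment_text answer_txt` at the top of a code cell, and the cell text would be assigned to answer_txt for testing.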
and then in the associated test cell assign the specimen answer to ___specimen_txt:
### BEGIN HIDDEN TESTS
___specimen_txt = '''
As much English wine's consumed as each of Bordeaux red and Burgundy white.
The biggest spike is on the Reception Wines, and we don't really know anything
about the provenance of those wines.
So given the global pre-eminence of French wines, maybe HoC
aren't doing too bad a job of making the case for English
and Welsh.
'''
lang_test(___specimen_txt, answer_txt)
### END HIDDEN TESTS
Once again, it may make more sense to define some magic to handle this, such as %%mark_against_specimen, that could take a block of answer text, use that as the basis of comparison, and display or return the result directly. The marker can then review the similarity grid produced in support of automated marking and assign an appropriate manual mark.
It might also be worth considering whether similarity marks should be fed back to the student or used in support of generating (or at least drafting) canned feedback.
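A minimal sketch of what such a %%mark_against_specimen magic might look like, assuming the lang_test checker from this recipe is available in the notebook namespace (the magic name and the injectable marker parameter are assumptions for illustration only):

```python
# Hypothetical %%mark_against_specimen cell magic: the cell body is the answer
# text, the magic line names the variable holding the specimen text.
try:
    from IPython import get_ipython
except ImportError:
    # IPython is not installed; fall back so the function is still importable
    def get_ipython():
        return None

def mark_against_specimen(line, cell, ns=None, marker=None):
    """Mark the cell body (the answer) against the specimen named on the line."""
    ns = ns if ns is not None else get_ipython().user_ns
    # Default to the lang_test checker defined in this recipe
    marker = marker if marker is not None else ns["lang_test"]
    return marker(ns[line.strip()], cell)

ip = get_ipython()
if ip is not None:
    # Register as a cell magic only when running inside IPython/Jupyter
    ip.register_magic_function(mark_against_specimen, magic_kind="cell")
```

The hidden test cell then reduces to assigning ___specimen_txt and writing the answer under the magic, rather than calling lang_test explicitly.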
Import the necessary model (note: spacy is required for the similarity checker, and pandas is used, overkill style, to display the results).
#!pip install spacy
#!pip install pandas

import spacy

try:
    import en_core_web_sm
except ImportError:
    # Download the model if it isn't already installed
    import spacy.cli
    spacy.cli.download("en_core_web_sm")
    import en_core_web_sm

import warnings
from spacy.errors import ModelsWarning
warnings.filterwarnings("ignore", category=ModelsWarning)

import pandas as pd

# Loading the model may take some time
nlp = en_core_web_sm.load()
Generate a couple of reports. We can add a limit value to the marker so that only sentence pairs whose similarity score exceeds a minimum threshold are displayed. (This can help where a supplied answer is spread over several sentences compared to the specimen answer.)
from IPython.display import HTML, display

def lang_test(__specimen_txt, answer_txt, threshold=0, show_side_by_side=True, retval=False):
    ___specimen = nlp(__specimen_txt.replace('\n', ' '))
    ___answer = nlp(answer_txt.replace('\n', ' '))

    # It doesn't matter which order we present the parsed sentences for similarity scoring
    display(HTML('<h3>Similarity score overall</h3><p>Overall similarity score: {}</p>'.format(___specimen.similarity(___answer))))

    # Another approach may be to split the answer into sentences,
    # and then compare each sentence against each sentence in the specimen,
    # perhaps retaining the best match score for each answer sentence?
    ___sentence_scores = pd.DataFrame(columns=['Answer', 'Similarity', 'Specimen',
                                               'Answer (no stopwords)', 'Specimen (no stopwords)'])
    ___sentence_full_scores = pd.DataFrame(columns=['Answer', 'Similarity Overall'])

    for ___answer_sent in ___answer.sents:
        ___answer_sent_nostop = nlp(' '.join([token.text for token in ___answer_sent
                                              if not token.is_stop and not token.is_punct]))
        ___sentence_full_scores = pd.concat([___sentence_full_scores,
                                             pd.DataFrame({'Answer': [___answer_sent.text],
                                                           'Similarity Overall': [___specimen.similarity(___answer_sent_nostop)]
                                                           })])
        for ___specimen_sent in ___specimen.sents:
            ___specimen_sent_nostop = nlp(' '.join([token.text for token in ___specimen_sent
                                                    if not token.is_stop and not token.is_punct]))
            #print('\nComparing:', ___answer_sent, '\nvs.\n', ___specimen_sent,
            #      '\nMark:\t', ___specimen_sent.similarity(___answer_sent))
            ___sentence_scores = pd.concat([___sentence_scores,
                                            pd.DataFrame({'Answer': [___answer_sent.text],
                                                          'Similarity': [___specimen_sent_nostop.similarity(___answer_sent_nostop)],
                                                          'Specimen': [___specimen_sent.text],
                                                          'Answer (no stopwords)': [___answer_sent_nostop.text],
                                                          'Specimen (no stopwords)': [___specimen_sent_nostop.text]
                                                          })])

    ___pd_default_colwidth = pd.get_option('display.max_colwidth')
    pd.set_option('display.max_colwidth', None)

    if show_side_by_side:
        display(HTML('''<table><thead><td>Provided</td><td>Specimen</td></thead>
        <tr><td>{}</td><td>{}</td></tr></table>'''.format(answer_txt, __specimen_txt)))

    display(HTML('<h3>Sentence level similarity with full specimen answer</h3>'))
    display(___sentence_full_scores[___sentence_full_scores['Similarity Overall'] > threshold])

    display(HTML('<h3>Sentence level matching</h3>'))
    ___sentence_scores.set_index(['Answer', 'Specimen'], inplace=True)
    display(___sentence_scores[___sentence_scores['Similarity'] > threshold])

    pd.set_option('display.max_colwidth', ___pd_default_colwidth)

    if retval:
        # Return the score dataframes rather than just displaying them
        return ___sentence_full_scores, ___sentence_scores
answer_txt = '''
Reception wine was the most consumed, but we donlt know where that comes from.
English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux.
'''
___specimen_txt = '''
As much English wine's consumed as each of Bordeaux red and Burgundy white.
The biggest spike is on the Reception Wines, and we don't really know anything
about the provenance of those wines.
So given the global pre-eminence of French wines, maybe HoC
aren't doing too bad a job of making the case for English
and Welsh.
'''
lang_test(___specimen_txt, answer_txt)
Overall similarity score: 0.81345909359606

| Provided | Specimen |
| --- | --- |
| Reception wine was the most consumed, but we donlt know where that comes from. English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux. | As much English wine's consumed as each of Bordeaux red and Burgundy white. The biggest spike is on the Reception Wines, and we don't really know anything about the provenance of those wines. So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. |

Sentence level similarity with full specimen answer:

| | Answer | Similarity Overall |
| --- | --- | --- |
| 0 | Reception wine was the most consumed, but we donlt know where that comes from. | 0.527520 |
| 0 | English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux. | 0.558373 |

Sentence level matching:

| Answer | Specimen | Similarity | Answer (no stopwords) | Specimen (no stopwords) |
| --- | --- | --- | --- | --- |
| Reception wine was the most consumed, but we donlt know where that comes from. | As much English wine's consumed as each of Bordeaux red and Burgundy white. | 0.656954 | Reception wine consumed donlt know comes | English wine consumed Bordeaux red Burgundy white |
| | The biggest spike is on the Reception Wines, and we don't really know anything about the provenance of those wines. | 0.637424 | Reception wine consumed donlt know comes | biggest spike Reception Wines know provenance wines |
| | So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. | 0.722088 | Reception wine consumed donlt know comes | given global pre eminence French wines maybe HoC bad job making case English Welsh |
| English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux. | As much English wine's consumed as each of Bordeaux red and Burgundy white. | 0.835747 | English Welsh wines consumed followed white Burgundy red Bordeaux | English wine consumed Bordeaux red Burgundy white |
| | The biggest spike is on the Reception Wines, and we don't really know anything about the provenance of those wines. | 0.443787 | English Welsh wines consumed followed white Burgundy red Bordeaux | biggest spike Reception Wines know provenance wines |
| | So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. | 0.671232 | English Welsh wines consumed followed white Burgundy red Bordeaux | given global pre eminence French wines maybe HoC bad job making case English Welsh |
lang_test(___specimen_txt, answer_txt, 0.7)
Overall similarity score: 0.81345909359606

| Provided | Specimen |
| --- | --- |
| Reception wine was the most consumed, but we donlt know where that comes from. English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux. | As much English wine's consumed as each of Bordeaux red and Burgundy white. The biggest spike is on the Reception Wines, and we don't really know anything about the provenance of those wines. So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. |

Sentence level similarity with full specimen answer (no sentences exceed the 0.7 threshold, so the table is empty):

| Answer | Similarity Overall |
| --- | --- |

Sentence level matching:

| Answer | Specimen | Similarity | Answer (no stopwords) | Specimen (no stopwords) |
| --- | --- | --- | --- | --- |
| Reception wine was the most consumed, but we donlt know where that comes from. | So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. | 0.722088 | Reception wine consumed donlt know comes | given global pre eminence French wines maybe HoC bad job making case English Welsh |
| English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux. | As much English wine's consumed as each of Bordeaux red and Burgundy white. | 0.835747 | English Welsh wines consumed followed white Burgundy red Bordeaux | English wine consumed Bordeaux red Burgundy white |